In this tutorial we will cover the basics of using Google Custom Search to search the Internet.
Links:
Video Tutorial:
In [1]:
import json
import requests
import pandas as pd
Google Custom Search replaces the deprecated Google Search API. It is designed to search one or more specific websites and to be embedded within a website.
There is still an option to search the entire web. This option, combined with not specifying any websites to search, returns results that are very close to what you get when you search Google directly. The remaining difference is due to the personalized and localized results that regular Google Search returns.
In [2]:
key = ""
cx = ""
Parameter name | Value | Description |
---|---|---|
Required parameters | | |
q | string | The search expression. |
Optional parameters | | |
c2coff | string | Enables or disables Simplified and Traditional Chinese Search. |
cr | string | Restricts search results to documents originating in a particular country. |
cref | string | The URL of a linked custom search engine specification to use for this request. |
cx | string | The custom search engine ID to use for this request. |
dateRestrict | string | Restricts results to URLs based on date. |
exactTerms | string | Identifies a phrase that all documents in the search results must contain. |
excludeTerms | string | Identifies a word or phrase that should not appear in any documents in the search results. |
fileType | string | Restricts results to files of a specified extension. A list of file types indexable by Google can be found in the Webmaster Tools Help Center. |
filter | string | Controls turning on or off the duplicate content filter. |
gl | string | Geolocation of end user. |
googlehost | string | The local Google domain (for example, google.com, google.de, or google.fr) to use to perform the search. |
highRange | string | Specifies the ending value for a search range (see lowRange). |
hl | string | Sets the user interface language. |
hq | string | Appends the specified query terms to the query, as if they were combined with a logical AND operator. |
imgColorType | string | Returns black and white, grayscale, or color images: mono, gray, and color. |
imgDominantColor | string | Returns images of a specific dominant color. |
imgSize | string | Returns images of a specified size. |
imgType | string | Returns images of a type. |
linkSite | string | Specifies that all search results should contain a link to a particular URL. |
lowRange | string | Specifies the starting value for a search range. Use lowRange and highRange to append an inclusive search range of lowRange...highRange to the query. |
lr | string | Restricts the search to documents written in a particular language (e.g., lr=lang_ja). |
num | unsigned integer | Number of search results to return. |
orTerms | string | Provides additional search terms to check for in a document, where each document in the search results must contain at least one of the additional search terms. |
relatedSite | string | Specifies that all search results should be pages that are related to the specified URL. |
rights | string | Filters based on licensing. Supported values include: cc_publicdomain, cc_attribute, cc_sharealike, cc_noncommercial, cc_nonderived, and combinations of these. |
safe | string | Search safety level. |
searchType | string | Specifies the search type: image. If unspecified, results are limited to webpages. |
siteSearch | string | Specifies all search results should be pages from a given site. |
siteSearchFilter | string | Controls whether to include or exclude results from the site named in the siteSearch parameter. |
sort | string | The sort expression to apply to the results. |
start | unsigned integer | The index of the first result to return. |
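As an illustration of how the optional parameters combine with the required ones, a request could limit the result count, restrict the date range, and target a single site. The values below are placeholders, not recommendations:
parameters = {"q": "halloween",
              "cx": cx,
              "key": key,
              "num": 5,                           # at most 10 results per request
              "dateRestrict": "m6",               # assumed format: results from the last 6 months
              "siteSearch": "en.wikipedia.org",   # illustrative site only
              }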
In [3]:
url = "https://www.googleapis.com/customsearch/v1"
parameters = {"q": "halloween",
"cx": cx,
"key": key,
}
In [4]:
page = requests.request("GET", url, params=parameters)
In [5]:
results = json.loads(page.text)
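Before exploring the response it is worth confirming that the request actually succeeded. A minimal check, assuming the API reports failures through the HTTP status code and an "error" key in the JSON body:
# A non-200 status (or an "error" key in the body) means the search failed,
# e.g. because of a bad key/cx or an exhausted quota.
if page.status_code != 200:
    print("Request failed with status", page.status_code)
    print(results.get("error", results))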
In [6]:
results.keys()
Out[6]:
In [7]:
results["kind"]
Out[7]:
In [8]:
results["url"]
Out[8]:
In [9]:
len(results["items"])
Out[9]:
In [10]:
results["queries"]
Out[10]:
In [11]:
results["searchInformation"]
Out[11]:
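"searchInformation" holds summary statistics for the query. A small illustrative snippet (the key names below are assumed from the standard response format):
total = results["searchInformation"].get("totalResults")
search_time = results["searchInformation"].get("searchTime")
print(f"About {total} results returned in {search_time} seconds")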
In [12]:
results["items"][0]
Out[12]:
In [13]:
def process_search(results):
    # Flatten the "items" in a search response into a DataFrame
    # with one row per result and link/title/snippet columns.
    link_list = [item["link"] for item in results["items"]]
    df = pd.DataFrame(link_list, columns=["link"])
    df["title"] = [item["title"] for item in results["items"]]
    df["snippet"] = [item["snippet"] for item in results["items"]]
    return df
df = process_search(results)
df
Out[13]:
Use "start"
parameter to skip results from previous pages. To get the next "start"
index look it up in "queries.nextPage[0].startIndex"
In [14]:
next_index = results["queries"]["nextPage"][0]["startIndex"]
search_terms = results["queries"]["nextPage"][0]["searchTerms"]
url = "https://www.googleapis.com/customsearch/v1"
parameters = {"q": search_terms,
"cx": cx,
"key": key,
"start": next_index
}
In [15]:
page = requests.request("GET", url, params=parameters)
results = json.loads(page.text)
In [16]:
temp_df = process_search(results)
df = pd.concat([df, temp_df], ignore_index=True)
df
Out[16]:
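Putting the pieces together, below is a sketch of a small pagination helper. The function name search_pages and the three-page default are illustrative, and the loop assumes the API stops offering a nextPage once its usual limit of roughly 100 results per query is reached:
def search_pages(query, pages=3):
    """Fetch several result pages for one query and return them as a single DataFrame."""
    url = "https://www.googleapis.com/customsearch/v1"
    parameters = {"q": query, "cx": cx, "key": key}
    frames = []
    for _ in range(pages):
        page = requests.request("GET", url, params=parameters)
        results = json.loads(page.text)
        if "items" not in results:      # no results returned (or an error/quota problem)
            break
        frames.append(process_search(results))
        next_page = results["queries"].get("nextPage")
        if not next_page:               # no further pages are offered
            break
        parameters["start"] = next_page[0]["startIndex"]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

df_all = search_pages("halloween", pages=3)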